Reducing Memory Latency by Improving Resource Utilization

Author

  • Marius Grannæs
Abstract

Integrated circuits have been in constant progression since the first prototype in 1958, with the semiconductor industry maintaining a constant rate of miniaturisation of transistors and wires. Up until about the year 2002, processor performance increased by about 55% per year. Since then, limitations on power, ILP and memory latency have slowed the increase in uniprocessor performance to about 20% per year. Although the capacity of DRAM increases by about 40% per year, its latency decreases by only about 6-7% per year. This growing performance gap between the processor and DRAM leads to a problem known as the memory wall. This thesis aims to reduce memory latency by leveraging available resources with excess capacity. This has been achieved through multiple techniques, but mainly by using excess bandwidth and by improving scheduling policies. The first approach presented, destructive read DRAM, changes the underlying assumption that the contents of a DRAM cell are unchanged after a read: the latency of a read is reduced, but the rest of the memory system requires changes to preserve the data. Prefetching predicts what data will be needed in the future and fetches that data into the cache before it is referenced. This dissertation presents Delta Correlating Prediction Tables (DCPT), a technique for generating highly accurate prefetches with good timeliness. DCPT uses a table indexed by the load's address to store the delta history of individual loads; delta correlation is then used to predict future misses. Delta Correlating Prediction Tables with Partial Matching (DCPT-P) extends DCPT with L1 hoisting, which moves data from the L2 to the L1 cache to further increase performance. In addition, DCPT-P leverages partial matching, which reduces the spatial resolution of deltas to expose more patterns. The interaction between the memory controller and the prefetcher is especially important because of the complex 3D structure of modern DRAM.
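The delta-correlation idea behind DCPT can be sketched roughly as follows. This is a simplified software model, not the thesis's hardware design: the history length and prefetch degree are illustrative assumptions, and the table here is an unbounded dictionary rather than a fixed-size hardware structure.

```python
from collections import deque

class DCPTEntry:
    """Per-load state: last miss address and a short circular delta history."""
    def __init__(self, history_len=6):
        self.last_addr = None
        self.deltas = deque(maxlen=history_len)

class DCPT:
    """Sketch of Delta Correlating Prediction Tables: entries are indexed
    by the load's address (PC); the two most recent deltas are matched
    against earlier history, and the deltas that followed the match are
    replayed to generate prefetch addresses."""
    def __init__(self, degree=4):
        self.table = {}
        self.degree = degree  # max prefetches issued per access

    def access(self, pc, addr):
        e = self.table.setdefault(pc, DCPTEntry())
        prefetches = []
        if e.last_addr is not None:
            delta = addr - e.last_addr
            if delta != 0:
                e.deltas.append(delta)
        e.last_addr = addr
        d = list(e.deltas)
        if len(d) >= 4:
            pair = (d[-2], d[-1])
            # Search backwards for the most recent earlier occurrence
            # of the last delta pair.
            for i in range(len(d) - 3, 0, -1):
                if (d[i - 1], d[i]) == pair:
                    # Replay the deltas that followed the matching pair.
                    pred = addr
                    for nd in d[i + 1 : i + 1 + self.degree]:
                        pred += nd
                        prefetches.append(pred)
                    break
        return prefetches
```

For a simple strided stream (addresses 0, 4, 8, 12, 16 from one load), the repeated delta pair (4, 4) matches earlier in the history and the model predicts the next strided addresses 20 and 24.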
Utilizing open pages can increase the performance of the system significantly: memory controllers can increase bandwidth utilization and reduce latency at the same time by scheduling prefetches so that the number of page hits is maximized. The interaction between the program, the prefetcher and the memory controller is explored. This thesis also examines the impact of a shared memory system in a CMP. When resources are shared, one core might interfere with another core's execution by delaying its memory requests or displacing useful data in the cache. This effect is quantified, and the components most prone to inter-core interference are identified. Finally, we present a framework for measuring interference at runtime.
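The page-hit-aware scheduling idea can be illustrated with a minimal first-ready, first-come-first-served (FR-FCFS) style model. This is an assumption for illustration, not the thesis's actual controller: DRAM timing is not modelled, and the row size implied by `row_bits` is hypothetical.

```python
class OpenPageScheduler:
    """Minimal FR-FCFS-style sketch: among queued requests, serve
    row-buffer (open-page) hits first, falling back to the oldest
    request when no hit exists."""
    def __init__(self, row_bits=12):  # hypothetical 4 KiB rows
        self.open_rows = {}           # bank -> currently open row
        self.row_bits = row_bits

    def row_of(self, addr):
        return addr >> self.row_bits

    def service(self, queue):
        """queue: list of (bank, addr), oldest first. Returns the
        serviced address and whether it was a page hit."""
        chosen = None
        for i, (bank, addr) in enumerate(queue):
            if self.open_rows.get(bank) == self.row_of(addr):
                chosen = queue.pop(i)   # row hit: no activate needed
                break
        if chosen is None:
            chosen = queue.pop(0)       # no hit: plain FCFS
        bank, addr = chosen
        hit = self.open_rows.get(bank) == self.row_of(addr)
        self.open_rows[bank] = self.row_of(addr)
        return addr, hit
```

With requests to addresses 0x0000, 0x1000 and 0x0040 on one bank, the scheduler serves 0x0000 first (cold bank), then reorders 0x0040 ahead of 0x1000 because it hits the row opened by 0x0000; scheduling prefetches this way converts row misses into cheaper row hits.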


Similar resources

Accounting for Memory Use, Cost, Throughput, and Latency in the Design of a Media

Conventional wisdom holds that reducing disk latency leads to higher disk utilization, maximizing disk utilization leads to higher throughput, and employing a faster disk leads to better performance. All of this is true when building a conventional file or database system. In this paper we show that these principles can be misleading when applied to the design of a media server. We examine a number of t...


Accounting for Memory Use, Cost, Throughput, and Latency

Conventional wisdom holds that reducing disk latency leads to higher disk utilization, maximizing disk utilization leads to higher throughput, and employing a faster disk leads to better performance. All of this is true when building a conventional file or database system. In this paper we show that these principles can be misleading when applied to a media server. We examine a number of techniques t...


Bandwidth and Delay Optimization by Integrating of Software Trust Estimator with Multi-User Cloud Resource Competence

Trust Establishment is one of the significant resources to enhance the scalability and reliability of resources in the cloud environment. To establish a novel trust model on SaaS (Software as a Service) cloud resources and to optimize the resource utilization of multiple user requests, an integrated software trust estimator with multi-user resource competence (IST-MRC) optimization mechanism is...


Improving server utilization using fast virtual machine migration

Live virtual machine (VM) migration is a technique for transferring an active VM from one physical host to another without disrupting the VM. In principle, live VM migration enables dynamic resource requirements to be matched with available physical resources, leading to better performance and reduced energy consumption. However, in practice, the resource consumption and latency of live VM migr...


Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the oth...



Publication date: 2010